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Abstract: Many applications in the field of statistics require Markov chain 
Monte Carlo methods. Determining appropriate starting values and run 
lengths can be both analytically and empirically challenging. A desire to 
overcome these problems has led to the development of exact, or perfect, 
sampling algorithms which convert a Markov chain into an algorithm that 
produces i.i.d. samples from the stationary distribution. Unfortunately, very 
few of these algorithms have been developed for the distributions that arise 
in statistical applications, which typically have uncountable support. Here 
we study an exact sampling algorithm using a geometrically ergodic Markov 
chain on a general state space. Our work provides a significant reduction to 
the number of input draws necessary for the Bernoulli factory, which enables 
exact sampling via a rejection sampling approach. We illustrate the algo- 
rithm on a univariate Metropolis-Hastings sampler and a bivariate Gibbs 
sampler, which provide a proof of concept and insight into hyper-parameter 
selection. Finally, we illustrate the algorithm on a Bayesian version of the 
one-way random effects model with data from a styrene exposure study. 
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1. Introduction 

Suppose we want to explore a probability distribution tt defined on X. Further 
suppose TT is intractable in the sense that direct (i.i.d.) sampling is unavailable. 
In this setting, the Markov chain Monte Carlo (MCMC) method can be a useful 
tool since it is often straightforward to construct and simulate an ergodic Markov 
chain that has tt as its stationary distribution (Chen et al., 2000; Liu, 2001; 
Robert and Casella, 1999). The two main drawbacks of MCMC relative to direct 
sampling from tt are (i) the difficulty in ascertaining how long the Markov chain 
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needs to be run before it gets "close" to n (see, e.g., Jones and Hobert, 2001), and 
(ii) the difRculty in deriving and calculating asymptotically valid standard errors 
for the ergodic averages that are used to approximate intractable expectations 
under tt (see, e.g., Flegal et al., 2008). 

A desire to overcome these problems has led to the development of clever 
techniques using a Markov chain to create an algorithm that produces i.i.d. 
draws from the stationary distribution (e.g., Craiu and Mcng, 2011; Green and 
Murdoch, 1999; Hubcr, 2004; Propp and Wilson, 1996; Wilson, 2000). Unfortu- 
nately, very few of these so-called perfect sampling algorithms have been devel- 
oped for the distributions that arise in realistic statistical applications, which 
typically have uncountable support. Asmussen et al. (1992) and Blanchet and 
Meng (2005) provide one such algorithm applicable to Markov chains on gen- 
eral state spaces. The main assumption necessary is that the chain satisfies a 
one-step minorization condition. As we describe later, under this condition the 
stationary distribution admits a mixture representation, suggesting the follow- 
ing two-step sampling approach: sample the discrete distribution corresponding 
to the mixture weights, then sample the selected mixture component. 

This approach has never been successfully implemented, however, it has been 
used to obtain approximate draws from tt, see for example Blanchet and Thomas 
(2007), Hobert et al. (2006) and Hobert and Robert (2004). The difficult part 
is drawing from the discrete distribution corresponding to the mixture weights, 
which is done via a rejection sampling approach. In this paper, we provide so- 
lutions to a number of practical problems and illustrate the algorithm on three 
examples. This requires overcoming two challenges: (i) obtaining a dominating 
proposal distribution and (ii) generating a Bernoulli variate to decide whether 
a proposed draw is accepted or not. While (ii) might seem trivial, the chal- 
lenge is that we are unable to (exactly) compute the success probability for this 
Bernoulli variate. 

A solution to (i) requires identification of a bounding (proposal) probability 
mass function for the target mass function. Previously, Blanchet and Meng 
(2005) proposed an upper bound on moments associated with the target mass 
function. Blanchet and Thomas (2007) used output from a preliminary run of 
the Markov chain to construct an approximate upper bound. We provide an 
explicit bound for Markov chains satisfying a geometric drift condition using 
results from Roberts and Tweedie (1999). 

A solution to (ii) will determine whether to accept a proposed draw. The 
decision is made by generating a Bernoulli random variable with success prob- 
ability that involves the ratio between the target and proposal mass functions. 
In our case, the target mass function is unknown, apparently making this step 
impossible. However, one can still generate such a Bernoulli variate, using a so- 
called Bernoulli factory (Keane and O'Brien, 1994). Briefly, a Bernoulli factory 
is an algorithm that outputs a Bernoulli variate with success probability /(p), 
from i.i.d. Bernoulli(p) variates, when / is known but p is unknown. 

The rejection sampling approach requires a Bernoulli factory algorithm for 
f{p) = ap and a G (1, oo). Nacu and Peres (2005) and Latuszynski et al. (2011) 
provide an algorithm when a = 2, but their algorithms are computationally 
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demanding and scale poorly for a G (1,cxd). For example when p e (0,.4), 
one requires at least 65, 536 Bernoulli(p) random variables to generate one 
Bernoulli(2p) variatc. In this paper, we provide an algorithm for any a € (l,oo) 
that reduces the computational time substantially. For example whenp g (0, .4), 
we can obtain a Bernoulli(2p) variate with only 256 Bernoulli(p) random vari- 
ables. This is an important reduction because the Bernoulli factory accounts 
for much of the computational time in the exact sampling algorithm. Section 3 
contains a full description the Bernoulli factory and our modification. 

Our solutions to (i) and (ii) yield an exact sampling algorithm for tt. The 
algorithm is suitable even for intractable distributions on general state spaces: 
that is, for distributions that typically arise in statistical applications. This is 
an important extension, since very few existing algorithms apply to general 
state spaces, but it is limited in the sense that one must be able to establish 
a drift and associated minorization condition for the underlying Markov chain. 
The current algorithm can be computationally demanding, however we have 
successfully implemented it in three examples. 

Our first example considers a univariate Metropolis-Hastings sampler for 
which we obtain 1000 i.i.d. draws. The second example considers a slightly 
more complicated bivariate Gibbs sampler where we again obtain 1000 i.i.d. 
draws. These two examples could be considered toy examples in the sense that 
i.i.d. observations are available for each. However, they provide insights into the 
performance and hyper-parameter selection of the algorithm. 

Our final example considers a Bayesian version of the classical one-way ran- 
dom effects model that is widely used to analyze data. We illustrate the exact 
sampling algorithm, using data from a styrene exposure study, to obtain 20 
i.i.d. draws. This is the first successful implementation of an exact sampling 
algorithm for a model of this type. Our analysis considers a balanced design 
and requires development of a suitable drift condition, which improves upon the 
existing drift constants of Tan and Hobcrt (2009). 

These examples give hope for exact sampling algorithms for general state 
space Markov chains in more complicated settings. Even if we are unable to 
obtain multiple draws in these settings, a single exact draw will alleviate the 
need for burn-in entirely. 

The rest of this paper is organized as follows. Section 2 provides the mixture 
representation of tt, details the rejection sampling approach and bounds the 
tail probabilities of the proposal distribution. Section 3 introduces the Bernoulli 
factory and proposes a new target function that speeds up the algorithm sig- 
nificantly. Section 4 gives the full exact sampling algorithm. Sections 5 and 6 
implement the algorithm for two toy examples and a Bayesian version of a 
one-way random effects model, respectively. Finally, Section 7 discusses our im- 
plementation and provides some general recommendations to practitioners. 

2. Exact sampling via a mixture distribution 

Suppose we want to explore the intractable probability measure TT{dx) defined 
on the measurable space (X,S(X)). Let P : X x B{X) — > [0,1] be a Markov 
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transition function and let X = {Xn}^^Q denote the corresponding Markov 
chain. Then for a: G X and a measurable set A, 

P{x, A) = Pr {Xn+i e A\Xn = x) . 

Assume that tt is an invariant measure for the chain; i.e., tt{A) = J.^ P{x, A) TT{dx) 
for all measurable A. Assume further that X satisfies the usual regularity con- 
ditions, which are irreducibility, aperiodicity and positive Harris recurrence. For 
definitions, see Meyn and Twecdic (1993) and Roberts and Rosenthal (2004). 
Finally, assume we are able to simulate the chain; that is, given Xn — x, we 
have the ability to draw from P{x, •). 

The exact sampling algorithm considered here utilizes a mixture representa- 
tion for TT (Asmussen et al., 1992; Hobert et al., 2006; Hobert and Robert, 2004), 
however, we must first develop the split chain. The main assumption necessary 
is that X satisfies a one-step minorization condition, i.e. there exists a function 
s : X — >• [0, 1] satisfying s{x) Tr{dx) > and some probability measure Q{dy) 
on (X,6(X)) such that, 

P{x, A) > s{x) Q{A) for all a; e X and A e B{X) . (1) 

Given X satisfies the one-step minorization condition at (1), then P can be 
decomposed as 

P{x, dy) = s{x) Q{dy) -|- (1 - s{x)) R{x, dy) , (2) 

where 

P{x,dy) " s{x)Q{dy) 

R{x,dy) = — , 

1 — s[x) 

and define R{x, dy) = if s{x) = 1. It is helpful to think of (2) as a mixture of 
two Markov transition functions with probabilities s{x) and 1 — s(x). Equation 
(2) shows that it is possible to simulate Xnj^i given X„ = a; as follows: Flip a 
coin (independently) that comes up heads with probability s{x). If the coin is 
a head, take Xn+i ~ Q{-)', if it's a tail, take Xn+i ^ R{x, •). 

This decomposition has several important applications in MCMC. Indeed, it 
can be used to perform regenerative simulation (Hobert et al., 2002; Mykland 
et al., 1995) and to derive computable bounds on the convergence rate of X 
(Lund and Tweedie, 1996; Roberts and Tweedie, 1999; Rosenthal, 1995, 2002). 

Now consider a new Markov chain that actually includes the coin flips men- 
tioned above. Let X' = {{Xn,Sn)}^^Q be a Markov chain with state space X x 
{0, 1}. If the current state is (X„, 5„) = {x, 6), then the next state, {Xn+i,Sn+i), 
is drawn as follows. U d = 1, then X„+i ^ Q{-)', while if (5 = 0, Xn+i ~ R{x, •). 
Then, conditional on Xn+i = x' , dn+i ~ Bernoulli(s(x')). This chain is called 
the split chain. Equation (2) implies that, marginally, the sequence of X„ values 
in the split chain has the same overall probability law as the original Markov 
chain X (Nummelin, 1984, Chapter 4). Note that, if (5„ — 1, then the distribu- 
tion of {Xn+i, Sn+i) does not depend on x. 
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Remark 1. We can avoid drawing from Q{-) entirely by changing the order 
shghtly. Given x is the current state, we can simply generate Xi^i ^ P{x, •) in 
the usual manner and then generate 5i\Xi, XiJ^i with 

Pr[6i = l\Xi,X^+i) = —— TTT^ , 3 

where q{-) and k{-\x) are the densities corresponding to Q{-) and P (Nummelin, 
1984, p. 62). 



2.1. A mixture representation of n 



The reason for introducing the split chain is that it possesses an accessible atom, 
Xx {1}. Indeed, each time the set Xx {1} is entered, the split chain stochastically 
restarts itself (because the next X„ has distribution Q). Let N = {1, 2, 3, . . . } 
and {Xq, 6q) £ X x {1}, then define the first return time to the atom as 

r min{n G N : (X„,,5„) G X x {!}} . 

Our assumptions about P imply that E(r) < oo and hence the sequence {pn}'^=i 
defined by 

_ Pr(r > n) 
^" " E(r) 

is nonnegative, nonincreasing, and sums to one. Let T denote a discrete random 
variable on N with Pr(T — n) — p„; also let Qn be the conditional distribution 
of Xn given that the split chain does not return to the atom before time n. Thus 
for any n G N and any measurable A, Qn{A) — Pr(X„ G A \ t > n). (Note that 
Qi = Q.) Then tt can be written as the following mixture of the Q„ values: 

CO 

■n[dx) = ^ p„ Qn{dx) . (4) 

n=l 

Remark 2. The representation at (4) can be obtained from results in Asmussen 
et al. (1992) by applying their methods to the split chain. Alternatively, Hobcrt 
and Robert (2004) obtain the representation when s{x) has the specific form 
elc{x) and Hobcrt ct al. (2006) obtain the representation with the more general 
minorization shown here. 

The representation at (4) offers an alternative sampling scheme for tt. First, 
make a random draw from the set {Qi, Q2, Qa, • ■ • } according to the prob- 
abilities pi,p2,P3, ■ ■ ■ and then make an independent random draw from the 
chosen Qn- 



Sampling algorithm for vr: 

1. Draw T such that Pr{T = n) = pn for n = 1,2,3,..., call the result t. 

2. Make a draw from Qt{ )- 
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Drawing from Q„ is simple even when n > 2. Indeed, just repeatedly sim- 
ulate n iterations of the split chain, and accept Xn the first time that Si — 
■ ■ ■ — Sn-i = 0. The challenging part of this recipe is drawing from the set 
{Qi, Q2, Qa, ■ • ■ }, i-e. simulating a random variable T, since the Pn values are 
not computable. Hobert et al. (2006) and Hobert and Robert (2004) approxi- 
mate the Pn values, which yields approximate draws from T and thus tt. In this 
paper, we obtain exact draws from T that result in exact draws from tt. 

Remark 3. Let K\T = nhe the number of simulations of the split chain before 
we get a draw from Qn, conditional on T = n from Step 1 above. Then K\T = n 
is geometric with mean E{K\T — n) — 1/P{t > n). Unfortunately, Blanchct 
and Mcng (2005) show E{K) — 00, and justifiably argue (4) should not be used 
for multiple replications. This presents a major challenge in the applicability 
of our algorithm and others that can be similarly expressed, some of which are 
discussed in the next section. 



2.2. Rejection sampler for T 

There is one case where simulating T is simple (Hobert et al., 2006; Hobert and 
Robert, 2004). Suppose that in the minorization condition (1), s{x) = e > for 
all a; g X (implying the Markov chain is uniformly ergodic) and consider the 
procedure for simulating the split chain with this constant s. In particular, note 
that the coin flip determining whether 6n is or 1 does not depend on x, and it 
follows that the number of steps until the first return to the accessible atom has 
a geometric distribution. Indeed, Pi{t > n) = {1 — e)"^^. Hence, E(t) = 1/e 
and 

Pn = Pr(T = n) = ^^^0f^ = e{l - e)-' , 

so T also has a geometric distribution. Therefore it is easy to make exact draws 
from TT. 

Hobert and Robert (2004) show this exact sampling algorithm is equivalent 
to Murdoch and Green's (1998) Multigamma Coupler and to Wilson's (2000) 
Read-Once algorithm. It is interesting that (4) can be used to reconstruct perfect 
sampling algorithms based on coupling from the past despite the fact that its 
derivation involves no backward simulation arguments. Of course, this exact 
sampling algorithm will be useless from a practical standpoint if e is too small. 

Unfortunately, in statistical inference problems, the MCMC algorithms are 
usually driven by Markov chains that are not uniformly ergodic and, hence, 
cannot satisfy (1) with a constant s. Moreover, there is no efficient method to 
simulate T where s is non-constant. (When s is non-constant, the distribution 
of T is complex and its mass function is not available in closed form. Hence, the 
mass function of T is also unknown, which precludes direct simulation of T.) 
Therefore we must resort to indirect methods of simulating T. 

Fortunately, simulating r is trivial — indeed, one can simply run the split 
chain and count the number of steps until it returns to X x {1}. Because this 
provides an unlimited supply of i.i.d. copies of t, we can use a rejection sampling 
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approach (Asmussen et al., 1992; Blanchet and Meng, 2005) to simulate T from 

the i.i.d. sequence ri, T2, . . . (where ri = r). 

Suppose there exists a function d : N — > [0, 1] such that X^i^i din) = D < oo 
and P{t > n) < Md{n) where M is a finite, positive constant. Consider a 
rejection sampler with candidate mass function d{-)/D. Thus 

Pr(T = n) _ Pr(r > n)/E(T) _ D Pi{t > n) D 
d{n)/D ^ d{n)/D ^ E(7) d(n) " E(7) ' ' 

which justifies the following rejection sampler. 

Rejection sampler for simulating T: 

1. Draw T - Call the resuh n and let a = 1/ [Md{n)]. 

2. Draw an independent Bernoulli random variable, B, with success proba- 
bility aPr(T > n). If i? = 1, accept T — n; it B = 0, return to Step 1. 

Unfortunately, the standard method of simulating B (by computing aPr(T > 
n) and comparing it to an independent Uniform(0, 1) random variable) is not 
available to us because the mass function of t is unavailable in closed form. 
However, we may draw B without knowing the value of Pr(r > n) using a 
supply of i.i.d. copies of r. This is the basis of our exact sampling approach. 

Suppose a G (0, 1] and let p = Pr(r > n), then there exists a simple solution 
to generate B ^ Bernoulli(ap), which Fill (1998) calls "engineering a coin flip". 
Indeed simulate a single r and define 



W 



1 if r > n 
if T < n 



hence W ~ Bernoulli(p). If we independently simulate V ^ Bernoulli(a) as 
usual and set B = VW, then 

Pr {B = l)= Pr {[V = l]n[W ^ 1]) = Pr {V = 1) Pr {W ^ 1) = ap . 

That is, B ^ Bernoulli(ap), obtained by simulating a single t and a single 
Bernoulli V. 

When a e (1, oo), we will obtain B ^ Bernoulli(ap) via the Bernoulli factory 
described in Section 3. For now assume such a simulation is possible, then what 
remains to establish is a computable tail probability bound, i.e. the sequence 
d{n) and the constant M. 



2.3. Tail probability bound 



Blanchet and Meng (2005) bound the moments of r, however they do not ex- 
plicitly determine computable values M and d{n). Fortunately, Pr(T > n) can 
be bounded above by a known constant times a known geometric mass function 
if X satisfies a geometric drift and associated one-step minorization conditions. 
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We will say a drift condition holds if there exists some function V : X i-^ [1, oo), 
some < A < 1 and some & < oo, such that 



E [V{Xi+i)\X, = x] < XV{x) + I(^^ec)b for all a; G X 



(5) 



In addition, we require the associated one-step minorization condition as follows; 
assume that s{x) is bounded below by e on C and that 



Pr{x, A) > eQ{A) for aU a; e C, Ae 6(X) 



(6) 



Hobert and Robert (2004) provide the following bound on Pi{t > n) based 
results in Roberts and Twcedie (1999). Define A = supj-g^; E [V{Xi^i)\Xi — x], 
J ^ {A- £)/A, and 




log A log (1 



if J < 1 , 
< A"^ if J > 1 . 



Then letting 0(/3) = log 
Pr(T >n)<l3 



logJ-log(l-£) 
logA-i, if /3 e (l,/3*), we have 



b ' 


0(/3) 


1 -/?(!-£) 


[e{l-\)\ 




l~{l-e) (J/(l -£))*(« 



/?-" (7) 



= Md{n) , 
where d{n) = /3~" and 

M = /3 



b 




l-/3(l-£) 


[eil-X)\ 




l-{l~e) (J/(l-e))*(« 



Note Yl^=i = ^2"^=! /^~" — phj = D < oo since (3 e (1, /3*). Having estab- 
lished the inequality in (7), we next detail how to generate B ~ Bernoulli(ap) 
when a > 1. 



3. Bernoulli factory 

Given a sequence W — {Wn}n>i of i.i.d. Bernoulli(p) random variables, where p 
is unknown, a Bernoulli factory is an algorithm that simulates a random variable 
B ~ Bernoulli(/(p)), where / is a known function. For the exact sampling 
algorithm, we require a Bernoulli factory where f{p) = ap. This idea arrose in 
Asmusscn et al. (1992) when proposing an exact sampling algorithm for general 
regenerative processes. 

Consider f : S t-^ [0, 1], where S C (0, 1). Kcanc and O'Brien (1994) show is 
it possible to simulate a random variable B ^ Bernoulli (/(p)) for all p e S if 
and only if / is constant, or / is continuous and satisfies, for some n > 1, 

min{/(p),l-/(p)}>min{p,l-p}" Vp G 5 . (8) 

While Keane and O'Brien (1994) develop the necessary and sufficient conditions 
on /, they do not provide a detailed description of an algorithm. Nacu and Peres 
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(2005) suggest a constructive algorithm via Bernstein polynomials for fast sim- 
ulation, i.e. the number of input Bernoulli(p) variates needed for the algorithm 
has exponentially bounded tails. However, we find no practical implementation 
since it requires dealing with sets of exponential size. Our approach is based on 
the recent work of Latuszynski et al. (2011), which avoids keeping track of large 
sets by introducing a single auxiliary random variable. 

The general approach, in the formulation of Latuszynski et al. (2011), is 
to construct two random approximations to f{p), denoted J7„ and L„, which 
depend on Wi , W2 , . . . , W„ and satisfy 

l>Un = UniWi,. ..,W„) > Un+l > > i„ = i„(W^l, ■ ■ . ,Wn) > a.S. 

(9) 

The random variables Un and L„ approximate /(p) in the sense that E{Un) 
f{p) and E{Ln) f{p) as n — > oo. The decision to continue sampling or output 
a zero or a one in the Bernoulli factory is made using an auxiliary Uniform(0, 1) 
variable. 

Remark 4. The almost sure monotonicity requirement in (9) is typically dif- 
ficult to attain and thus Latuszynski et al. (2011) relax it by using super/sub- 
martingales instead. 



3. 1 . Modified target function 

For the rejection sampling approach to simulating T, we have the ability to 
simulate W by setting Wi = I{Ti > n) for i > 1. We require a single B ^ 
Bernoulli(ap), where a — l/[Md{n)] is a known constant such that a > 0. The 
outcome B determines if we accept or reject the proposed value. For a e (0, 1] 
we use the simple solution in Section 2 and for a G (1, cxi) we use the Bernoulli 
factory. 

Unfortunately, the function f{p) — ap on (0, 1/a) does not satisfy (8) and 
cannot be simulated via the Bernoulli factory. However, when restricted to 
f{p) = min{ap, 1 — w} for a; > 0, such a simulation is possible. 

Nacu and Peres (2005) and Latuszynski et al. (2011) provide a detailed al- 
gorithm for a = 2 and < a; < 1/4. Their construction requires a minimum 
of 65,536 input variables (see Table 1) before the requirement C/„ < 1 at (9) 
is met. This is due to the fact that f{p) — min{2p, 1 — is not differentiable 
and the Bernstein polynomials can approximate general Lipschitz functions at 
a rate of l/y/n (sec part (i) of Lemma 6 from Nacu and Peres, 2005). However, 
when the target function / is twice differentiable, the rate increases to 1 /n (see 
part (ii) of the same Lemma). 

This suggests the number of Bernoulli (p) input variates required may decrease 
significantly by using a twice differentiable /. With this in mind, we propose 
extending ap smoothly from [O, to [0,1]. Fix S < uj and consider the 

following function 



F 



0,1-i^ 



fap/S 

[0,(5) F{p)^S I e-''dt 
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which is bounded by S and twice difFerentiable, with F'{p) — aexp{—a^p^/S^} 
and F"{p) ^ -2pf^ exp{-a^p'^ /S^} < 0. Standard calcuhis also gives \F"{p)\ < 
c? {F is related to the Gauss error function (erf), though it can be simply 
calculated from a standard normal distribution function.) 
Then define our target function / as 



f{p) 




if p e 
if p e 



a 

UJ 



(10) 



In other words, we have extended ap such that / defined at (10) is twice dif- 
ferentiable with |/"| < C = a^j^- Define a{n,k) = f{k/n) and h{n,k) = 
a{n, k) + C/ (2ri) using / at (10), then we can state Algorithm 4 of Latuszynski 
et al. (2011) with our modification. 

Algorithm I. 

1. Simulate Go ^ Uniform(0, 1). 

2. Compute m = min{m e N : 6(2", 2™) < 1}. Set n = 2", lii = and 

U:l = 1. 

2 

3. Compute i/„ = SLi = -^") ^"^'^ = ^('^' -^n)- 

4. Compute 



Ln-Y. — — n2'V ^-d^"-E — m — H2'V 



n — 










' n \ 


( 










T * 



i=0 



5. Compute 

L„ = + — ^ ( C/n - Lii ) and J7„ = t/.^ - ^ {U^-L^ 



n ~ 




Hn 








( 




^ n 




^ n 





6. If Go < Ln set B = 1; if Go > C/„ set B ^ 0. 

7. If L„ < Go < Un, set n = 2n, return to step 3. 

8. Output B. 



Theorem 1. Suppose a, 5 and u are constants such that < a < oo and 
< 5 < UJ < 1. Further suppose f{p) as defined at (10), ap e [0, 1—uj] and W = 
{I^n}n>i 'fe i.i.d. Bernoulli(p). Then Algorithm I outputs B ~ Bernoulli(ap). 
Moreover the probability that it needs N > n iterations equals a'^ /nSy/2e. 

Proof. See Appendix A. □ 

Remark 5. Note that the probability Algorithm I needs N > n iterations 
is independent of the unknown value of p. Hence the number of Bernoulli(p) 
variates required will also be independent of p. 
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Table 1 

Comparison of count of input Bernoulli{p) variates to implement the Bernoulli factory. 



a 


2 (N&P) 


2 


5 


10 


20 


Minimum 


65536 


256 


2048 


8192 


32768 


Mean Count 


65536 


562.9 


2439.8 


10373 


43771 


S.D. Count 





21046 


7287.6 


54836 


3.908e5 



Algorithm I provides a constructive algorithm for a e (1, oo) and reduces the 
number of input variables by a factor of over 100, which we demonstrate in the 
following example. This is a critical improvement since the Bernoulli factory ac- 
counts for most of the computational demands of the exact sampling algorithm. 

3.2. Bernoulli factory example 

Consider generating 10,000 Bernoulli(ap) variates for various values of a while 
setting p = 0.01, uj = 1/5 and S = 1/6. Table 1 displays the minimum number of 
Bernoulli(p) variates required, along with the observed mean and standard devi- 
ation for the count of Bernoulli(p) variates used to generate 10,000 Bernoulli(ap) 
variates. We can see for a = 2 the minimum and observed mean have been 
reduced substantially when comparing our target function with the Nacu and 
Peres (2005) target (N&P). The reduction in observed input 
Bernoulli(p) variates represents a 120 times reduction in computational time. 
The N&P implementation always stops the simulation at the minimum, and 
hence the standard deviation of the count is 0. Table 1 also shows the input 
variates required increases as a increases. Simulations for other p values (not 
shown) provide very similar results. 

4. Exact sampling algorithm 

The Bernoulli factory algorithm. Theorem 1, requires ap < 1 
let K > 1 such that 1/k < 1 — uj and from (7) 

Pr(T > n) < Md{n) < Md(n)K . 

Then letting a — 1/ [Md{n)K\ we have aPr(T > n) < 1/k < 1 
The following algorithm results in exact draws from tt. 



Exact sampling algorithm for tt: 

1. Draw T* - Geometric (1 - 1//3), i.e. Pr (T* = k) = {l/pf^^ (1 - l/P) 
for fc = 1, 2, . . . , and call the result n. Set a = 1/ [M(i(n)«;]. 

2. If a < 1, draw a single t random variable. Let W = I{t > n) and inde- 
pendently draw V ~ Bernoulli (a). Set B = VW. 

3. If a > 1, use the Bernoulli factory to obtain B, a single Bernoulli random 
variable with success probability p = aPr(T > n). 

4. If B = 1, accept T* = n; li B = 0, return to Step 1. 

5. Make a draw from Q„(-). 



— cj. To this end, 



— w < 1 for all n. 
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The algorithm requires selection of e, C, A, and k from a range of possible 
values (depending on the drift and minorization) . Further, /? g (l,/3*) must 
be selected depending on the previously selected parameters. Each selection 
impacts the algorithm performance and we suggest investigation of different 
settings for a given example. Our examples in Sections 5 and 6 discuss hyper- 
parameter selection and provide further recommendations. 



5. Toy examples 

This section contains two toy examples in the sense that we can obtain i.i.d. 
samples for each, hence there is no practical reason for considering MCMC 
algorithms. The purpose is to gain insights into the exact sampler and study its 
performance. 



5.1. Metropolis -Hastings example 

This section illustrates the exact sampling algorithm for a Metropolis-Hastings 
sampler. Suppose that X — [0, oo) and Tr{dx) — fx(x) dx where fx{x) — 
e"^ I{x > 0). Consider the function V{x) — e^^ for some c > and sup- 
pose we use a Metropolis sampler with a symmetric proposal density g{-\x)^ 
which is supported on [x — 7, x -t- 7], 7 > 0. Then, for x > 7, 



E\V(X,^r)\X,=x\= f V{y)g{y\x)dy+ f ^\{y)g{y\x)dy 



-7 

f f(v) 
V{x)g{y\x)dy(l--^^y' 



M 

fix) 



fix), 

V(2x ~y) + V{y) ^ + V{x) {l g(^y\^)dy 
= V{x) (e-"(^~") + e("-i)(^-") + 1 - e-(^-")) giy\x)dy. 

Selecting g to be the uniform density giy\x) — i^Iy(z[x-j,x+'y], we get 



27 

7 



E[ViX,+,)\X, = x] = V{x)^ [ (e-- + e(=-i)- + 1 - e'^) 



dz. ill) 



27 7o 

When X G [0,7], 

fix) 

V{x)g{y\x)dy(l ^^^^ ^ 



px /'X-\-'y 

E[V{X,+,)\X,^x]^ V{y)g(y\x)dy+ V{y)g{y\x)dy 

jQ Jx 

fM\ 

fix)) 



■ 1 cx i-x+l 

e''y — dy+ — / e("-i)(^-^)dy 

27 27 7, 



27 



(1 - e-(^-^)) dy . (12) 
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When combined, (11) and (12) obtain a drift condition for the Metropohs- 
Hastings sampler considered. However, the selection of the constants c and 7 
is crucial to obtaining a reasonable computation time. The constants /3 and 
M, which are described in Section 2.3, also depend on c and 7. Based on our 
example in Section 3, our strategy is to maximize /3, which in turn results in 
small values for a = l/[Md{n)K]. Figure 1(a) displays a contour map of f3* for 
c < 0.3 and 7 < 10. Based on this plot, we select c = 0.028 and 7 = 4, resulting 
in /3 = 1.0243 (as seen below). Evaluating the integrals in (11) and (12) gives 
the drift condition 

E[ViX,+i)\X, ^x]< XV (x) + I^^^c)b 

where A = 0.977, h = 0.1 and the smaU set C = [0,4]. The interval [0,7] is 
indeed a small set, since, when x G [0,7], 

P{x,dy) > g(y|a;)dymin |l, ^/j^g[o^x+7]C^yniin |l, 



1 
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This establishes the necessary minorization condition ^(2;, dy) > s{x)Q{dy) 
where 

1 _ 1 - e^'< 

Q{dy) = q{y)dy = _^ _ ^_^ e ''/j^g[o,^]dy and s{x) = — — — 4e[o,7] ■ 

The remaining numerical elements required for the geometric probability bound 
are: A = sup^g^ -^^[^(-^i+il-^i = x]) = 1.09197 and J = iA-e)/X = 0.99283 < 
1. Then /3 = 1/X — 1.0243. Finally, the Bernoulli factory hyper-parameters are 
K — 5/4: and 6 = 1/6 resulting in cj = 0.2. 

Following Mykland ct al. (1995), we can now simulate the split chain as fol- 
lows: (1) draw Xn+i from g{-\Xn) and accept it with probability minjl, j-^^) ^ } i 
(2) if the candidate in step (1) is rejected, set (5„ = 0, otherwise, generate i5„ as 
a Bernoulli variate with success probability given by 

Pr{S,, = 1) - s{X„MXn+,) 



5(X„+i|X„)min{l,^i§±i)} ■ 

Using the exact sampling algorithm, we generated 1000 i.i.d. Exp(l) random 
variates. Figure 1(b) shows a Q-Q plot of the observed draws versus the the- 
oretical quantiles of the Exp(l) distribution. During our simulations none of 
the proposed T* values which required the use of the Bernoulli factory were 
accepted. This is due to the fact that our Metropolis-Hastings sampler regener- 
ates very fast, roughly in about 20 moves. Thus for large T* , when the Bernoulli 
factory is necessary, the probability Pr(T > T*) is negligible and the Bernoulli 
factory outputs a zero. Improvements via modified drift and minorization or 
hyper-parameter selection may improve this situation. 
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(a) Contour map of (b) Q-Q plot. 

Fig 1 , Plots for Metropolis- Hastings example. 

5.2. Gibbs example 

Suppose Yi\fj,,0 ^ N(/i, 0) independently for i = l,...,m where m > 3 and 
assume the standard uwariant prior v{^^9) oc 0^^/"^. The resulting posterior 
density is 

e\y) cc exp {-^{s' + {y - ^^r)} (13) 

where is the usual biased sample variance. It is easy to see the full conditional 
densities, f{^\9,y) and f{0\^,y), are given by ^J,\0,y ~ N{y,0/m) and 0\fi,y ~ 
IG((m — l)/2, m [s^ + {y — m)^] /2), hence a Gibbs sampler is appropriate. (We 
say W ^ IG(q!,/3) if its density is proportional to ■w~^°'^^^e~^^'^I{w > 0).) We 
consider the Gibbs sampler that updates 6 then /i; that is, letting x' — (6'',/i') 
denote the current state and x {6, ^i) denote the future state, the transition 
looks like {0',ft') — > {0,IJ-') — > (^,m)- This Markov chain then has state space 
X = K+ X M and transition density 

k{e,fi\e',^l') = f{9\^l',y)f{^l\0,y) . (w) 

Appendix B provides a drift and minorization condition using a small set in the 
form 

C = {ie,fi) e M+ X M : V{n,e) < d} , 

where V{6, fi) — 1 + (fi ~ y)^ . 

Suppose y = 1, = 4, and m — 11. We set A = 0.5 and d ~ b/{\ — 
(m - 3)-^) = 11/3 resulting in e = 0.5750034 and /3* = 1.3958. Given the 
allowable range /? G (1, /?*), we select f3 = 1.35, which turns out to be extremely 
important for implementation. Selection of /3 very close to (3* or close to 1 
seems to cause problems for the Bernoulli factory because of large constant 
multipliers a = 1/ [Md(n)K] given typical proposed T* = n (which depend on 
/3). Experimentation has shown us values somewhat close to (3* seem to provide 
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Table 2 

Summary constants for Gibbs sampler example with j3 = 1.35. The row min refers to the 
minimum number of observed ts to implement the Bernoulli factory and p{n) = P (T* = n) . 



n 


1 


2 


3 


4 


5 


6 


7 


8 


9 


10 


p{n) 


0.259 


0.192 


0.142 


0.105 


0.078 


0.058 


0.043 


0.032 


0.023 


0.017 


a 


0.08 


0.11 


0.14 


0.19 


0.26 


0.35 


0.47 


0.64 


0.86 


1.16 


min 




















128 


n 


11 


12 


13 


14 


15 


16 


17 


18 


19 


20 


p(n) 


0.013 


0.010 


0.007 


0.005 


0.004 


0.003 


0.002 


0.002 


0.001 


0.001 


a 


1.57 


2.12 


2.87 


3.87 


5.22 


7.05 


9.52 


12.85 


17.35 


23.42 


min 


128 


256 


512 


1024 


2048 


4096 


8192 


8192 


16384 


32768 



the best results. Finally, the Bernoulli factory hyper-parameters are k = 5/4 
and 6^1/6 resulting in a; = 0.2. 

Table 2 summarizes the resulting constants given the hyper-parameter choices. 
Notice the Bernoulli factory will be necessary for proposed values greater than 
or equal to 10, that is with probability 0.067. The constants for values greater 
than 20 are not listed. However, these values are of interest since the minimum 
number of observed r values becomes extremely large. While values in this range 
are uncommon, P (T* > 20) « 0.002, they occur with enough frequency to slow 
down the algorithm substantially. 

The exact sampling algorithm was used to generate 1000 exact draws from 
the posterior density at (13). Generating the 1000 draws required approximately 
35 hours of computational time, about 2 minutes per draw. A total of 27,665 T* 
values were proposed of which 69 (0.25%) were greater than 20. Implementing 
the Bernoulli factory required 1.52e9 r values, or 1.52e6 r values per exact draw. 
However, most of the computational time, and necessary r values, were used 
for a small number of proposed T* values. Similar to the Metropolis-Hastings 
example, none of the accepted T* values were from the Bernoulli factory. 

The largest proposed T* value was 37 with a 3848 requiring 1.07e9 r val- 
ues (of which were > 37) to implement the Bernoulli factory. This value alone 
accounted for about 70% of the total number of r values, and hence about 70% 
of the total computational time. It should be noted that the largest accepted T* 
value was 9, so proposals unlikely to be accepted account for most of the compu- 
tational demands. Removing only the largest proposal, the remaining 999 draws 
required approximately 10 hours of computation, or about 35 seconds per draw. 

Alternatively for this example, we can sequentially sample from (13) to obtain 
i.i.d. draws (Flegal et al., 2008). Figure 2 compares the 1000 exact draws to 1000 
i.i.d. draws from a sequential sampler using a Q-Q plot for both fi and 6. We can 
see from these plots that the exact sampling algorithm is indeed working well. 

6. Bayesian random effects model 

This section considers a Bayesian version of the one-way random effects model 
given by 

Yij ^(p, + (ij, i^l,...,q, j = l,...,mi 
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Fig 2. Q-Q plots comparing 1000 exact draws to 1000 i.i.d. draws from a sequential sampler. 



where the random effects (j)i are i.i.d. N{^,a'^) and independently the errors 
are i.i.d. N{0,a'^). Thus (/i, cr^jCTg) is the unknown parameter. 

Bayesian analysis using this model requires specifying a prior distribution, 
for which we consider the family of inverse gamma priors 

where ai, 02, /3i and (32 are hyper-parameters. If we let y — {yij} and (p — {(j>i} 
denote the vectors of observed data and random effects respectively, then the 
posterior density is as follows 

TT {<l>, f^, 0-0, C^e) « / (yl'/'. '^l^ <^l) f <^l, CTe) <^l, <^l) > (15) 

where 

9 m, „i r 1 1 

and 

/(<^|/i,a^,a2) = J](27ra2)-'exp|-^(0,-M)'| • 

For ease of exposition, we will suppress the dependency on the data y and define 
the usual summary statistics: yi = Vij-> = Hi V = ^'^^^ Hi Hj Vij^ 

SST = J2i iVi - yf and SSE = J2^ T,j iVij - V^f ■ 

We consider a block Gibbs sampler that updates 9 = (cr^, al) then ^ = (^, 0), 
that is {9' , — > (6*, — > (0, ^). The necessary full conditionals can be obtained 
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via manipulation of (15). That is, f{0\^') is the product of two inverse gammas 
such that 

cr^lC ^ IG \-+ ai, — ^ h/3] 



and 



rr\^ ^r. MC)+SSE 

O^el? IG[ — + Q!2, ^ hfe 



where wi(C) = ELi(<?^i " A*)^ and W2{C) = - Vif ■ Further, /(^|6I) 

is multivariate normal density whose parameters are given in Tan and Hobcrt 
(2009). This Markov chain then has state space X = x and transition 
density 

o\ r, e') = f{e\e)fm = f{^i\Of{^i\C)fm . 

Implementation of the exact sampling algorithm requires a drift and associ- 
ated minorization condition as at (5) and (6). Hobert and Geyer (1998), Jones 
and Hobert (2001, 2004) and Tan and Hobert (2009) analyze variations of the 
proposed block Gibbs sampler, however none obtain sufficient constants for a 
practical implementation of our algorithm. To this end, the following theorem 
improves upon the drift constants of Tan and Hobert (2009) for a balanced 
design while using a simplified version of their drift function. 

Theorem 2. Let mi — m for all i = 1, ... ,17 and let A2 = 1 — l/[q{m + 
1)] +max{g(m + l)/m2 ,1/m}. Further let V : Wi+^ x ^ [l,oo) such that 
y(^, 6) = K + (5iwi(^) + 52W2{0 where K >l, 5i > and 82 > Q and define 



X* 



1 SiA2/S2 + q + l 



2ai - 2 ' M + 2oL2 - 2 



Then there exists K > 1, Si > and 62 > such that A* < 1 and (5) holds. 
That is, for any A G (A*, 1), 

E{Vi^,e)\^\9') < Anr,0 + W(e,fl')ec (16) 
2Si(3i , {SiA2 + 52{q + l))iSSE + 2p2) 



where 



b = K{l-X) + 



q + 2ai-2 M + 2a2 - 2 



q 

{61 + mS2)J2iy^ - y f 

i=l 



and 



C^{i^,e)e M^+i X Rl : V{t 9) < d} 
where d — b/(X — X*). 

Proof. See Appendix C. □ 
The drift condition still holds if we increase the small set to 

C = {(e, 9) e M-'+i xRl : K + diwi{0 < d, K + ^2^2(0 < 4 , 
for which Appendix C provides the associated minorization condition. 
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6.1. Styrene exposure dataset 

We will implement the exact sampling algorithm using the styrene exposure 
dataset from Lylcs ct al. (1997) analyzed previously by Jones and Hobert (2001) 
and Tan and Hobert (2009). The data, summarized in Table 3, is from a balanced 
design such that = m = 3 for i = 1, . . . , g, g = 13, and M — mq — 39. 

We consider prior hyper-parameter values ai = a2 = 0.1 and Pi = — 10. 
The drift function at (16) requires specification of drift parameters K — 50, 
A2 = 6.7585, i5i = 1, and 82 — \ which results in A* ~ 0.5580. We then choose 
A = 0.97, b = 37.88927 and d = 91.96992, resulting in e = 0.01269784. 

Using these settings /3* = 1.000092 and we choose (3 = 1.000083 resuhing in 
M = 10.19413. We again used Bernoulli factory hyper-parameters of k = 5/4 
and (5=1/6 resulting in = 0.2. In this case, the Bernoulli factory is necessary 
for proposals greater than 30,706, approximately 8% of proposed values. It is 
extremely likely the output from the Bernoulli factory will be zero since a sample 
of 1000 i.i.d. T values yielded only a maximum of 1278. 

The exact sampling algorithm was run until we obtained a 20 i.i.d. draws 
from the posterior at (15) which took 31,887 proposed T* values and 2.61e8 r 
values for the Bernoulli factory. The accepted T* values and i.i.d. 9 values are 
listed in Table 4. Notice the maximum accepted T* was 287, which is well below 



Table 3 

Styrene exposure data summary statistics. 



Worker 

m 



1 

3.302 



2 3 4 5 

4.587 5.052 5.089 4.498 



6 

5.186 



7 

4.915 



Worker 



8 

4.876 



9 10 11 12 

5.262 5.009 5.602 4.336 



13 
4.813 



y = 4.089, SST = 11.430, SSE = 14.711 



Table 4 

List of 20 i.i.d. 9 draws from the posterior at (15) with the accepted T* values. 



Draw T* cr^ al 



1 145 2.477 1.2624 

2 18 1.234 1.7698 

3 286 3.058 1.8791 

4 40 2.607 1.2079 

5 76 2.177 2.0603 

6 287 6.513 1.2870 

7 39 5.961 1.5295 

8 103 1.642 1.2093 

9 194 2.150 1.8129 

10 195 2.112 1.4871 

11 101 1.101 1.7166 

12 2 1.659 1.0805 

13 5 5.544 1.2856 

14 9 1.505 1.6600 

15 1 2.105 1.4137 

16 150 2.681 0.7317 

17 63 3.131 1.1506 

18 64 3.514 1.8245 

19 52 2.119 1.1743 

20 62 3.571 1.5009 
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our observed maxinium of 1278 from 1000 i.i.d. r values. Hence, drawing from 
Qn was easy and almost all of the simulation time was used for the Bernoulli 
factory. Obtaining the 20 i.i.d. draws required 24 days of computational time 
utilizing six processors in parallel (equating to 144 days on a single processor). 

7. Discussion 

This paper describes an exact sampling algorithm using a geometrically ergodic 
Markov chain on a general state space. The algorithm is applicable for any 
Markov chain where one can establish a drift and associated minorization with 
computable constants. The limitation of the method is that the simulation time 
may be prohibitive. 

Blanclict and Thomas (2007) implement an approximate version using the 
Bayesian probit regression example from van Dyk and Meng (2001) with regen- 
eration settings provided by Roy and Hobert (2007). This example is ill-suited 
using the proposed algorithm because of computational limitations related to 
the Bernoulli factory and in obtaining a practical s. Specifically, we found (in 
simpler examples) obtaining a single draw from tt sometimes required millions of 
i.i.d. r variates. Unfortunately, even using non-constant s(a;), the probit example 
requires about 14,000 Markov chain draws per t (Flegal and Jones, 2010). Hence 
obtaining a single draw from tt would require an obscene number of draws from 
X. Implementation for more complicated Markov chains, such as this, likely 
requires further improvements, or a lot of patience. 

Careful analysis of the Markov chain sampler is necessary to find useful drift 
and minorization constants. Most research establishing drift and minorization is 
undertaken to prove geometric ergodicity, in which case the obtained constants 
are of secondary importance. However, performance of the exact sampling algo- 
rithm is heavily dependent on these constants. Improving them may be enough 
to obtain exact samples in many settings. 

Alternatively, the speed of the overall algorithm would improve if one could 
find a bound using non-constant s{x) or a sharper bound with e. The current 
bound at (7) could potentially be modified upon by only considering specific 
models, or specific classes of models. 

Finally, one could obtain further improvements to the Bernoulli factory since 
it requires most of the necessary r variates. Our work has already obtained 
a 100 times reduction in computational time. However there may be further 
improvements available for the Bernstein polynomial coefficients, modifications 
to Algorithm 4 of Latuszynski et al. (2011) or an entirely different method to 
estimate /. Hyper-parameter settings also impact performance and could be 
investigated further. 
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Appendix A: Proof of Theorem 1 

Proof. By construction / is a smooth function / : [0, 1] — )■ [0, 1 — e] for some 
< e < oj — 6. Proposition 3.1 of Latuszynski et al. (2011) and Lemma 6 of 
Nacu and Peres (2005) prove existence of an algorithm that simulates f{p) if 

(i) / has second derivative /" which is continuous and 

(ii) the coefficients a and b satisfy 

„(2„..)(^;) > !:«(»,.)(")(;:.). (17) 

K2„..)(t) < i:k»..)(»)(,!,). (18) 

Condition (i) is clearly satisfied by construction, so it remains to check con- 
dition (ii). Since the coefficients a and b are defined through /, inequalities (17) 
and (18) will be checked using the properties of /. 

Recall b{n,k) = a{n,k) + C/(2n), the inequalities (17) and (18) above can 
be re-expressed (Nacu and Peres, 2005) as 

a{2n, k) > E{a{n, X)) and b{2n, k) < E(b{n, X)), 

where X is a hypergeometric random variable, with parameters (2n, fc, n). Using 
the definition of a, and the fact that / is concave, the first inequality is a 
direct application of Jensen's inequality. The second part is a straight forward 
application of Lemma 6 from Nacu and Peres (2005) and the properties of the 
hypergeometric distribution. 

Finally, the probability the algorithm needs N > n follows directly from 
definitions of the coefficients a and b and Theorem 2.5 of Latuszynski et al. 
(2011). □ 

Appendix B: Toy Gibbs drift and minorization 
B.l. Drift condition 

Let X = {Xn}n>o be the Markov chain corresponding to the Gibbs transition 
kernel given in (14). Recall X = K"*" xM., x' = {9',fj,') denotes the current state 
and X = {6, fi) denotes the future state. Jones and Hobert (2001) establish a 
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minorization and Rosenthal (1995)-type drift condition for to > 5, and hence 
prove the associated Markov chain is geometrically ergodic. Using their argu- 
ment we show a Roberts-and-Tweedie-type drift condition at (5) (Roberts and 
Tweedie, 1999, 2001) holds using the function V{x) = V{9,n) 1 + - yf. 
Conditional independence (based on the update order, see Jones and Hobert, 
2001) yields 

E[v{x,+l)\x, = x'] = E[v{0,^i)\e',^i'] 
= E[v{0,^l)\^^'] 
= E{E[vie,fi)\e]\t,'} . 

Since fi\6,y ^ N(y, 0/m), the inner expectation is 

E [V{0, ^l)\e] ^e\(i + {^i- yf) \0\=1+ Var (/i|0) = 1 + - . 

LV / J TO 

Then since 0\^l',yr^ IG((to - l)/2, to [s^ + (y - //)2] /2), 

TO [s2 + (y - ;/)2] 



and hence 



E [9\^i'] 



E[V{X,+^)\X,=x'] 



TO — 3 



1 + (m' - y) 



(19) 



TO — 3 ?n — 3 

Let A e ((to - 3)-i,l), 6 = (s2 + m-4) /(m-3), d > 6/ (A - (to - 3)-^) and 

C = {{6, ^) e M+ X M : V{n, 0) < d} , 
then the drift condition at (5) is satisfied, that is 

E [V{X,+i)\X, = x'] < XV{x') + I(^>ec)b for all x' eX. 
It is easy to see from (19) that 

A= sup E [V{X,+,)\X, =x]^ ^^P-'^^ ^^^'^ + b = + b . 

x'dc m - S TO - 3 



B.2. Minorization condition 

Now we establish the associated minorization condition at (6) using a similar 
argument to Jones and Hobert (2001). Let = {/.t G M : 1 + {fi — y)^ < d}, 
then for any fi' € 

k{e,fi\e',fi')^f{e\^i')f{fi\9)>f{fi\e) mi f{e\fi) . 
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Recall /(6'1/i) is an IG density, thus g{6) := inf^g^^^ /(^Im) can be written in 
closed form (Jones and Hobert, 2004; Rosenthal, 1996; Tan and Hobert, 2009), 



IG 
IG 



m— 1 ms . , 
2 ' 2 ' ' 



if 6* > 61* 



where 6* = m{d - 1) [(m - 1) log (l + ^)] ^ and IG{a,f3;x) is the inverse 
gamma density evaluated at x. If we further define 



f{fi\e) inf f{e\fi)dfid9= / inf f{e\fi)de 



and density q{9, ^) = £ ^ g{9) J {}i\0) , then 

k{9,^l\e',^i')>eq{9,^l) . 

Letting Q{-) be the probability measure associated with the density g, then the 
minorization condition from (6) holds, that is for any set A and any {9' , jj!) S C 

P{x,A) > eQ{A) for all A e B{X) . 

Notice the minorization condition holds for any d > 0. 
Simulating the split chain requires evaluation of (3), 



Pr (<5' = l|/i',0',/i,0) = 



kifi,9\fi',9') 
ee-'g{9) f{^i\9) 

9(0) 



™(s^+-f{t)<e-'}('i-l)) 
2 



(m-l)/2 



2 



g_(„_l)/2-l 

r((m-l)/2) '^'^P ■ 



(m-l)/2 



r((' 



(m-l)/2) \ [ 29 J / 



-| (m-l)/2 



• exp 



+ (y — 



29 



29 



Notice that Pr {S'\n', 9' , ^, 9) is free of e, 9', and ^. 
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Appendix C: One-way random effects drift and minorization 
C.l. Proof of Theorem 2 

Proof. Notice 

E{v{t e)\e, 0') = E{E{v{s„ e)\e)\^\ e'} (20) 

where the inner expectation becomes 

E{V{i, e)\6) + 5iE{wi{O\0) + 52E{w2mO) ■ 

Note that 



, 2 



9 If 

i=l j=l ^ 

For general designs (balanced or unbalanced) Tan and Hobcrt (2009) prove 

E Var(0, - ^l\e) < Aia| + A^a^ 

i=l 

where, 

Ai=minLfy^^^ \ g-"iax{mi,...,mj 

and 



A2 = — r + max < g ( 

^ TO,; A/(l + TOi) I TO; + 1 ^ 



M 



Simplifying Ai and A2 under the balanced design, we obtain Ai = 1 and 
A2 = 1 — l/[q{m + 1)] + max{q{m + 1)/to^ , 1/m}. 
From the full conditionals (Tan and Hobcrt, 2009) 



TOCT? 



ma'] 



then it follows that (with Ai = 1) 



1=1 
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Similarly, 

E{w2{0\o) < {q + l)<7l + mJ2iy^-y)' 



To complete the calculation in (20), recall cr^\C and crj|^', thus 

Ei4ia - ^^^^ 

and 



2 ^ W2{a + SSE + 2P2 
^ ''^^ ' M + 2a2 - 2 

It follows that 

2Ji/3i {5i/^2 + 52[(l + ^)){SSE + 2P2) 
q + 2ai-2 M + 2a2 ~ 2 

+ {Si + mS2)J2iy^-y)^ ■ (21) 

1=1 

Note there exists Si > and 62 > such that A* < 1, hence the drift condition 
at (5) is satisfied. □ 

For the exact sampling algorithm, we can see from (21) and the definitions 
of C and b that 

A ^ sup [ViX.,i)\X. ^ .] < + + _ . 

x'ec q + 2ai - 2 M + 2a2 - 2 

C.2. Minorization condition 

Next we show the associated minorization condition holds. The argument will 
be similar to the toy Gibbs example from Appendix B. Let = S W^^^ : 
1 + 5iWi{C) < rf, 1 + S2'W2{C) ^ d,}, then the associated minorization condition 
holds if we can find an e and q{^, 9) such that for any ^' e 

HLo\e,e') = fi<yi\afi'^i\afm 

>/(e|0) inf /(a^lO inf fiallO 

= /(el^)5l(^^)52(^e') 

= eqi^,e). (22) 
The two infimums can be found analytically as before: 

/ /G(|+a„^+A;a|) if < 
[ /G(|+ai,/3i;a2) if > a; 
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and 



IG 



2 + 0^2: 2 

IG (f +a2,^+/32 



{d-K)/S2+SSE 
SSE 



-ft 



if a? > 



The points cr^ and a* are the intersection points of the two inverse gamma 
densities determined by 

* 61-62 



a (log 61 - log 62) 



where a, 61 and 62 are the parameters of the two inverse gamma distributions. 
We can define 



fm9{0)d^d9= / g{9)de 



and density q{^,9) — e~^ffi(CT^)52(o'e)/('?|^), then (22) holds. Since this mi- 
norization condition holds for any d > 0, this establishes the associated mi- 
norization condition. 

Simulating the split chain requires evaluation of (3) similar to the calculation 
in Appendix B, 



' exp 



/Kie)/(^e^ie) 

/{,2<,.}(ci-X)/Ji + 2/3i 



2al 



2al 



h-l«}i'^ - + + 2fe 

W2{£,') + SSE + 
I{.l<ai}{d~K)/52 , W2{C) 



■ exp 



2^2 



2(7? 
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